AR environment
Transcending Dimensions using Generative AI: Real-Time 3D Model Generation in Augmented Reality
Behravan, Majid, Haghani, Maryam, Gracanin, Denis
Traditional 3D modeling requires technical expertise, specialized software, and time-intensive processes, making it inaccessible for many users. Our research aims to lower these barriers by combining generative AI and augmented reality (AR) into a cohesive system that allows users to easily generate, manipulate, and interact with 3D models in real time, directly within AR environments. Utilizing cutting-edge AI models like Shap-E, we address the complex challenges of transforming 2D images into 3D representations in AR environments. Key challenges such as object isolation, handling intricate backgrounds, and achieving seamless user interaction are tackled through advanced object detection methods, such as Mask R-CNN. Evaluation results from 35 participants reveal an overall System Usability Scale (SUS) score of 69.64, with participants who engaged with AR/VR technologies more frequently rating the system significantly higher, at 80.71. This research is particularly relevant for applications in gaming, education, and AR-based e-commerce, offering intuitive model creation for users without specialized skills.
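For readers unfamiliar with how a SUS figure such as 69.64 is produced, the standard SUS scoring rule can be sketched as follows. This is a minimal illustration of the generic questionnaire arithmetic, not the authors' analysis code; the function names `sus_score` and `mean_sus` are hypothetical, and the sample ratings are made up for demonstration only.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one participant's ten ratings (1-5)."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for i, r in enumerate(responses):
        # Odd-numbered items (index 0, 2, ...) are positively worded: r - 1.
        # Even-numbered items (index 1, 3, ...) are negatively worded: 5 - r.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    # The summed contributions (0-40) are scaled to the 0-100 range.
    return total * 2.5


def mean_sus(all_responses):
    """Average the per-participant SUS scores (hypothetical helper)."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


# Made-up example ratings, not data from the study.
best_case = [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]
neutral = [3] * 10
overall = mean_sus([best_case, neutral])
```

An overall score is simply the mean of the per-participant scores, so a subgroup figure (such as the 80.71 reported for frequent AR/VR users) would be the same mean taken over that subgroup only.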
- North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
From Voices to Worlds: Developing an AI-Powered Framework for 3D Object Generation in Augmented Reality
Behravan, Majid, Gracanin, Denis
This paper presents Matrix, an advanced AI-powered framework designed for real-time 3D object generation in Augmented Reality (AR) environments. By integrating a cutting-edge text-to-3D generative AI model, multilingual speech-to-text translation, and large language models (LLMs), the system enables seamless user interactions through spoken commands. The framework processes speech inputs, generates 3D objects, and provides object recommendations based on contextual understanding, enhancing AR experiences. A key feature of this framework is its ability to optimize 3D models by reducing mesh complexity, resulting in significantly smaller file sizes and faster processing on resource-constrained AR devices. Our approach addresses the challenges of high GPU usage, large model output sizes, and real-time system responsiveness, ensuring a smoother user experience. Moreover, the system is equipped with a pre-generated object repository, further reducing GPU load and improving efficiency. We demonstrate the practical applications of this framework in various fields such as education, design, and accessibility, and discuss future enhancements including image-to-3D conversion, environmental object detection, and multimodal support. The open-source nature of the framework promotes ongoing innovation and its utility across diverse industries.
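The abstract does not say which simplification algorithm Matrix uses to reduce mesh complexity. As one simple illustration of the general idea, the sketch below applies vertex clustering: vertices that fall into the same grid cell are merged, and triangles that collapse are dropped. The function name `cluster_decimate` and the `cell_size` parameter are assumptions made for this example, not part of the framework.

```python
import math


def cluster_decimate(vertices, triangles, cell_size):
    """Merge vertices sharing a grid cell and drop collapsed triangles.

    vertices:  list of (x, y, z) tuples
    triangles: list of (i, j, k) vertex-index tuples
    cell_size: edge length of the clustering grid (assumed tuning knob)
    """
    cell_to_new = {}  # grid cell -> index of the merged vertex
    sums = []         # running coordinate sums per merged vertex
    counts = []       # number of original vertices per merged vertex
    remap = []        # original vertex index -> merged vertex index

    for x, y, z in vertices:
        cell = (math.floor(x / cell_size),
                math.floor(y / cell_size),
                math.floor(z / cell_size))
        if cell not in cell_to_new:
            cell_to_new[cell] = len(sums)
            sums.append([0.0, 0.0, 0.0])
            counts.append(0)
        idx = cell_to_new[cell]
        sums[idx][0] += x
        sums[idx][1] += y
        sums[idx][2] += z
        counts[idx] += 1
        remap.append(idx)

    # Each merged vertex sits at the centroid of the vertices it replaces.
    merged = [tuple(c / n for c in s) for s, n in zip(sums, counts)]

    new_triangles = []
    seen = set()
    for a, b, c in triangles:
        a, b, c = remap[a], remap[b], remap[c]
        if a == b or b == c or a == c:
            continue  # triangle collapsed to an edge or a point
        key = frozenset((a, b, c))
        if key in seen:
            continue  # duplicate of a triangle already kept
        seen.add(key)
        new_triangles.append((a, b, c))
    return merged, new_triangles


# Tiny made-up mesh: the first two vertices share a grid cell.
verts = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tris = [(0, 1, 2), (0, 2, 3), (1, 2, 3)]
small_verts, small_tris = cluster_decimate(verts, tris, cell_size=0.5)
```

A larger `cell_size` merges more vertices and yields a smaller file at the cost of detail, which is the trade-off a resource-constrained AR device would need to balance.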
- North America > United States (0.28)
- Europe > Sweden (0.14)
- Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.85)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.57)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)
Generative AI Framework for 3D Object Generation in Augmented Reality
This thesis presents a framework that integrates state-of-the-art generative AI models for real-time creation of three-dimensional (3D) objects in augmented reality (AR) environments. The primary goal is to convert diverse inputs, such as images and speech, into accurate 3D models, enhancing user interaction and immersion. Key components include advanced object detection algorithms, user-friendly interaction techniques, and robust AI models like Shap-E for 3D generation. Leveraging Vision Language Models (VLMs) and Large Language Models (LLMs), the system captures spatial details from images and processes textual information to generate comprehensive 3D objects, seamlessly integrating virtual objects into real-world environments. The framework demonstrates applications across industries such as gaming, education, retail, and interior design. It allows players to create personalized in-game assets, customers to see products in their environments before purchase, and designers to convert real-world objects into 3D models for real-time visualization. A significant contribution is democratizing 3D model creation, making advanced AI tools accessible to a broader audience and fostering creativity and innovation. The framework addresses challenges like handling multilingual inputs, diverse visual data, and complex environments, improving object detection and model generation accuracy, as well as loading 3D models in AR space in real time. In conclusion, this thesis integrates generative AI and AR for efficient 3D model generation, enhancing accessibility and paving the way for innovative applications and improved user interactions in AR environments.
- Europe > Switzerland (0.04)
- North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- Research Report > New Finding (0.93)
- Research Report > Promising Solution (0.67)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.92)
- Education > Educational Setting (0.67)
- Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Reinforcement Learning-Enhanced Procedural Generation for Dynamic Narrative-Driven AR Experiences
Procedural Content Generation (PCG) is widely used to create scalable and diverse environments in games. However, existing methods, such as the Wave Function Collapse (WFC) algorithm, are often limited to static scenarios and lack the adaptability required for dynamic, narrative-driven applications, particularly in augmented reality (AR) games. This paper presents a reinforcement learning-enhanced WFC framework designed for mobile AR environments. By integrating environment-specific rules and dynamic tile weight adjustments informed by reinforcement learning (RL), the proposed method generates maps that are both contextually coherent and responsive to gameplay needs. Comparative evaluations and user studies demonstrate that the framework achieves superior map quality and delivers immersive experiences, making it well-suited for narrative-driven AR games. Additionally, the method holds promise for broader applications in education, simulation training, and immersive extended reality (XR) experiences, where dynamic and adaptive environments are critical.
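The core idea of dynamic tile weight adjustment in WFC can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the reinforcement learning policy is stubbed out as a plain dictionary of multipliers (`bonus`), and the names `shannon_entropy`, `adjust_weights`, and `collapse_cell` are hypothetical.

```python
import math
import random


def shannon_entropy(weights):
    """Entropy of a cell's remaining tiles; WFC collapses the lowest first."""
    total = sum(weights.values())
    return -sum((w / total) * math.log(w / total)
                for w in weights.values() if w > 0)


def adjust_weights(base_weights, bonus):
    """Scale each tile's base weight by a multiplier.

    In an RL-enhanced setup the multipliers would come from a learned
    policy; here `bonus` is just a dict (an assumption for illustration).
    """
    return {tile: w * bonus.get(tile, 1.0) for tile, w in base_weights.items()}


def collapse_cell(options, weights, rng):
    """Pick one tile for a cell, sampling in proportion to the weights."""
    tiles = sorted(options)  # fixed order keeps seeded runs reproducible
    return rng.choices(tiles, weights=[weights[t] for t in tiles], k=1)[0]


# Made-up tile set: the stubbed "policy" favours water three to one.
base = {"grass": 1.0, "water": 1.0}
tuned = adjust_weights(base, {"water": 3.0})
choice = collapse_cell({"grass", "water"}, tuned, random.Random(0))
```

Keeping the adjustment separate from the collapse step means the usual WFC constraint propagation is untouched; only the sampling distribution changes as gameplay needs shift.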
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
- North America > United States > Washington > King County > Renton (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
Self-supervised 6-DoF Robot Grasping by Demonstration via Augmented Reality Teleoperation System
Dengxiong, Xiwen, Wang, Xueting, Bai, Shi, Zhang, Yunbo
Most existing 6-DoF robot grasping solutions depend on strong supervision of grasp poses to ensure satisfactory performance, which can be laborious and impractical when the robot works in a restricted area. To this end, we propose a self-supervised 6-DoF grasp pose detection framework via an Augmented Reality (AR) teleoperation system that can efficiently learn from human demonstrations and provide 6-DoF grasp poses without grasp pose annotations. Specifically, the system collects human demonstrations from the AR environment and contrastively learns the grasping strategy from them. In the real-world experiment, the proposed system achieves satisfactory grasping performance and learns to grasp unknown objects within three demonstrations.
- North America > United States > New York > Monroe County > Rochester (0.04)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
FVA: Modeling Perceived Friendliness of Virtual Agents Using Movement Characteristics
Randhavane, Tanmay, Bera, Aniket, Kapsaskis, Kyra, Gray, Kurt, Manocha, Dinesh
We present a new approach for improving the friendliness and warmth of a virtual agent in an AR environment by generating appropriate movement characteristics. Our algorithm is based on a novel data-driven friendliness model that is computed using a user study and psychological characteristics. We use our model to control the movements corresponding to the gaits, gestures, and gazing of friendly virtual agents (FVAs) as they interact with the user's avatar and other agents in the environment. We have integrated FVAs with an AR environment using a Microsoft HoloLens. Our algorithm can generate plausible movements at interactive rates to increase the social presence. We also investigate the perception of a user in an AR setting and observe that an FVA yields a statistically significant improvement in the friendliness and social presence perceived by the user, compared to an agent without the friendliness modeling. We observe an increase of 5.71% in the mean responses to a friendliness measure and an improvement of 4.03% in the mean responses to a social presence measure.
- North America > United States > North Carolina (0.04)
- Oceania > Australia (0.04)
- North America > United States > Virginia > Alexandria County > Alexandria (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (0.87)